OCR exploration server

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Document modeling for form class identification

Internal identifier: 002520 (Main/Exploration); previous: 002519; next: 002521

Authors: Sébastien Diana [France]; Eric Trupin [France]; Yves Lecourtier [France]; Jacques Labiche [France]

Source :

RBID : ISTEX:68BF6D86950B7FFC13B84F81AD5A13828D4551FE

Abstract

Abstract: This article describes a document analysis system based on document modeling. The system is applied to forms used by the CAF (Caisse d'Allocations Familiales), the French national family allowance department. It consists of three modules, each handling a different stage of form processing. The first module, low-level processing, is divided into three stages: acquisition, binarisation and skew correction. These stages transform a paper form into an image of adequate quality. The second module, document structuration, processes this image to extract the information contained in the form. The information is arranged into a tree that represents the hierarchical organisation of the form content. In addition to extracting the tree, the document structuration module allows a form model base to be created. The last module, form class identification, uses the tree and the form model base. It comprises two pre-classifiers, which extract lists of candidate forms, and a structural classifier. The two pre-classifiers filter the 250 form classes in order to reduce the workload of the structural classifier. This classifier uses graph matching to compare the tree of the form under analysis with the candidate forms selected by the two pre-classifiers.
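
The identification stage described in the abstract (cheap pre-classifiers that shortlist form classes, followed by a structural comparison of trees) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Node` type, the two filtering features (node count and tree depth), the tolerance `tol`, and the toy model base are hypothetical, and the simple top-down label match stands in for the graph matching used by the authors, whose details the abstract does not give.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    """One element of a form tree (hypothetical representation)."""
    label: str
    children: list[Node] = field(default_factory=list)


def size(node: Node) -> int:
    """Number of nodes in the tree."""
    return 1 + sum(size(c) for c in node.children)


def depth(node: Node) -> int:
    """Number of levels in the tree."""
    return 1 + max((depth(c) for c in node.children), default=0)


def prefilter(form: Node, models: dict[str, Node], tol: int = 1) -> dict[str, Node]:
    """Two cheap filters (assumed features): node count, then depth."""
    by_size = {k: m for k, m in models.items() if abs(size(m) - size(form)) <= tol}
    return {k: m for k, m in by_size.items() if depth(m) == depth(form)}


def similarity(a: Node, b: Node) -> int:
    """Count nodes that agree in label and position (top-down, ordered).

    This is a simple stand-in, not the graph matching of the article.
    """
    if a.label != b.label:
        return 0
    return 1 + sum(similarity(x, y) for x, y in zip(a.children, b.children))


def identify(form: Node, models: dict[str, Node]) -> str | None:
    """Return the best-matching class among the shortlisted models."""
    candidates = prefilter(form, models)
    if not candidates:
        return None
    return max(candidates, key=lambda k: similarity(form, candidates[k]))


# Toy model base with three classes (the real base holds 250).
models = {
    "A": Node("form", [Node("header"), Node("table", [Node("row"), Node("row")])]),
    "B": Node("form", [Node("header"), Node("text"), Node("footer"), Node("box")]),
    "C": Node("form", [Node("title"), Node("table", [Node("row"), Node("cell")])]),
}
form = Node("form", [Node("header"), Node("table", [Node("row"), Node("row")])])

shortlist = sorted(prefilter(form, models))
best = identify(form, models)
print(shortlist, best)
```

In this toy run the depth filter discards class B, so the more expensive tree comparison is only performed against A and C. That is the purpose the abstract gives for the pre-classifiers.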

Url:
DOI: 10.1007/3-540-63791-5_13


Affiliations:




The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Document modeling for form class identification</title>
<author>
<name sortKey="Diana, Sebastien" sort="Diana, Sebastien" uniqKey="Diana S" first="Sébastien" last="Diana">Sébastien Diana</name>
</author>
<author>
<name sortKey="Trupin, Eric" sort="Trupin, Eric" uniqKey="Trupin E" first="Eric" last="Trupin">Eric Trupin</name>
</author>
<author>
<name sortKey="Lecourtier, Yves" sort="Lecourtier, Yves" uniqKey="Lecourtier Y" first="Yves" last="Lecourtier">Yves Lecourtier</name>
</author>
<author>
<name sortKey="Labiche, Jacques" sort="Labiche, Jacques" uniqKey="Labiche J" first="Jacques" last="Labiche">Jacques Labiche</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:68BF6D86950B7FFC13B84F81AD5A13828D4551FE</idno>
<date when="1997" year="1997">1997</date>
<idno type="doi">10.1007/3-540-63791-5_13</idno>
<idno type="url">https://api.istex.fr/document/68BF6D86950B7FFC13B84F81AD5A13828D4551FE/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">001854</idno>
<idno type="wicri:Area/Istex/Curation">001756</idno>
<idno type="wicri:Area/Istex/Checkpoint">001972</idno>
<idno type="wicri:doubleKey">0302-9743:1997:Diana S:document:modeling:for</idno>
<idno type="wicri:Area/Main/Merge">002651</idno>
<idno type="wicri:Area/Main/Curation">002520</idno>
<idno type="wicri:Area/Main/Exploration">002520</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Document modeling for form class identification</title>
<author>
<name sortKey="Diana, Sebastien" sort="Diana, Sebastien" uniqKey="Diana S" first="Sébastien" last="Diana">Sébastien Diana</name>
<affiliation wicri:level="4">
<country xml:lang="fr">France</country>
<wicri:regionArea>Laboratoire PSI / La3I, Université de Rouen, 76821, Mont Saint Aignan Cédex</wicri:regionArea>
<placeName>
<region type="region" nuts="2">Région Normandie</region>
<region type="old region" nuts="2">Haute-Normandie</region>
<settlement type="city">Mont Saint Aignan Cédex</settlement>
</placeName>
<orgName type="university">Université de Rouen</orgName>
</affiliation>
</author>
<author>
<name sortKey="Trupin, Eric" sort="Trupin, Eric" uniqKey="Trupin E" first="Eric" last="Trupin">Eric Trupin</name>
<affiliation wicri:level="4">
<country xml:lang="fr">France</country>
<wicri:regionArea>Laboratoire PSI / La3I, Université de Rouen, 76821, Mont Saint Aignan Cédex</wicri:regionArea>
<placeName>
<region type="region" nuts="2">Région Normandie</region>
<region type="old region" nuts="2">Haute-Normandie</region>
<settlement type="city">Mont Saint Aignan Cédex</settlement>
</placeName>
<orgName type="university">Université de Rouen</orgName>
</affiliation>
</author>
<author>
<name sortKey="Lecourtier, Yves" sort="Lecourtier, Yves" uniqKey="Lecourtier Y" first="Yves" last="Lecourtier">Yves Lecourtier</name>
<affiliation wicri:level="4">
<country xml:lang="fr">France</country>
<wicri:regionArea>Laboratoire PSI / La3I, Université de Rouen, 76821, Mont Saint Aignan Cédex</wicri:regionArea>
<placeName>
<region type="region" nuts="2">Région Normandie</region>
<region type="old region" nuts="2">Haute-Normandie</region>
<settlement type="city">Mont Saint Aignan Cédex</settlement>
</placeName>
<orgName type="university">Université de Rouen</orgName>
</affiliation>
</author>
<author>
<name sortKey="Labiche, Jacques" sort="Labiche, Jacques" uniqKey="Labiche J" first="Jacques" last="Labiche">Jacques Labiche</name>
<affiliation wicri:level="3">
<country xml:lang="fr">France</country>
<wicri:regionArea>Laboratoire ISMRA / LACP, Université de Caen, 14050, Caen Cédex</wicri:regionArea>
<placeName>
<region type="region" nuts="2">Région Normandie</region>
<region type="old region" nuts="2">Basse-Normandie</region>
<settlement type="city">Caen Cédex</settlement>
</placeName>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>1997</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">68BF6D86950B7FFC13B84F81AD5A13828D4551FE</idno>
<idno type="DOI">10.1007/3-540-63791-5_13</idno>
<idno type="ChapterID">13</idno>
<idno type="ChapterID">Chap13</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: This article describes a document analysis system based on document modeling. The system is applied to forms used by the CAF (Caisse d'Allocations Familiales), the French national family allowance department. It consists of three modules, each handling a different stage of form processing. The first module, low-level processing, is divided into three stages: acquisition, binarisation and skew correction. These stages transform a paper form into an image of adequate quality. The second module, document structuration, processes this image to extract the information contained in the form. The information is arranged into a tree that represents the hierarchical organisation of the form content. In addition to extracting the tree, the document structuration module allows a form model base to be created. The last module, form class identification, uses the tree and the form model base. It comprises two pre-classifiers, which extract lists of candidate forms, and a structural classifier. The two pre-classifiers filter the 250 form classes in order to reduce the workload of the structural classifier. This classifier uses graph matching to compare the tree of the form under analysis with the candidate forms selected by the two pre-classifiers.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>France</li>
</country>
<region>
<li>Basse-Normandie</li>
<li>Haute-Normandie</li>
<li>Région Normandie</li>
</region>
<settlement>
<li>Caen Cédex</li>
<li>Mont Saint Aignan Cédex</li>
</settlement>
<orgName>
<li>Université de Rouen</li>
</orgName>
</list>
<tree>
<country name="France">
<region name="Région Normandie">
<name sortKey="Diana, Sebastien" sort="Diana, Sebastien" uniqKey="Diana S" first="Sébastien" last="Diana">Sébastien Diana</name>
</region>
<name sortKey="Labiche, Jacques" sort="Labiche, Jacques" uniqKey="Labiche J" first="Jacques" last="Labiche">Jacques Labiche</name>
<name sortKey="Lecourtier, Yves" sort="Lecourtier, Yves" uniqKey="Lecourtier Y" first="Yves" last="Lecourtier">Yves Lecourtier</name>
<name sortKey="Trupin, Eric" sort="Trupin, Eric" uniqKey="Trupin E" first="Eric" last="Trupin">Eric Trupin</name>
</country>
</tree>
</affiliations>
</record>
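
As a complement to the Dilib commands, the record above can also be read with a general-purpose XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`. It works on an abbreviated copy of the record, and it assumes that the `wicri:` prefix must be bound before parsing, since the record as displayed does not declare it; the namespace URI `urn:example:wicri` is a placeholder invented for this example, not an official Wicri identifier.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the record shown above (only a few fields are kept).
RECORD = """<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Document modeling for form class identification</title>
<author>
<name sortKey="Diana, Sebastien" first="Sébastien" last="Diana">Sébastien Diana</name>
</author>
<author>
<name sortKey="Trupin, Eric" first="Eric" last="Trupin">Eric Trupin</name>
</author>
</titleStmt>
<publicationStmt>
<date when="1997" year="1997">1997</date>
<idno type="doi">10.1007/3-540-63791-5_13</idno>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""

# Placeholder URI (an assumption): the record uses the wicri: prefix
# without declaring it, which a namespace-aware parser rejects.
WICRI_NS = "urn:example:wicri"


def parse_record(text: str) -> ET.Element:
    """Bind the wicri: prefix on the root element, then parse."""
    declared = text.replace("<record>", f'<record xmlns:wicri="{WICRI_NS}">', 1)
    return ET.fromstring(declared)


def summarize(root: ET.Element) -> dict:
    """Extract title, authors, year and DOI from the TEI header."""
    stmt = root.find(".//titleStmt")
    return {
        "title": stmt.findtext("title"),
        "authors": [name.text for name in stmt.findall("author/name")],
        "year": root.find(".//publicationStmt/date").get("when"),
        "doi": root.findtext(".//publicationStmt/idno[@type='doi']"),
    }


summary = summarize(parse_record(RECORD))
print(summary)
```

The `xml:` prefix used by `xml:lang` is predefined in XML, so only `wicri:` needs this treatment.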

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002520 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 002520 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:68BF6D86950B7FFC13B84F81AD5A13828D4551FE
   |texte=   Document modeling for form class identification
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024